Skip to content

Conversation

@skylarmb
Copy link

@skylarmb skylarmb commented Dec 12, 2025

Summary

This PR introduces dual client generation for v0.x and v1.x APIs, along with significant infrastructure and documentation improvements.

Key Changes

Client Generation

  • Support for generating both v0.x and v1.x API clients from OpenAPI specs
  • Pydantic v2 compatibility for generated models

Development Environment

  • Nix/direnv configuration for reproducible development environments
  • Pre-commit hooks and quality gates

Documentation & Quality

  • Comprehensive documentation restructuring and improvements
  • Mermaid diagram dual-theme support
  • Agent OS quality framework and standards
  • Increased test coverage thresholds (80% project-wide, 70% per-file)

Integrations

  • Google ADK instrumentor support
  • OpenInference MCP instrumentor integration
  • OpenLLMetry (Traceloop) instrumentor support
  • Optional dependency groups for all instrumentors

CI/Infrastructure

  • Improved workflow triggers and branching strategy
  • Lambda/Docker compatibility testing
  • Documentation preview builds

Known CI Issues

  • Generated Code Check: Generated code drift needs regeneration (make generate-v0-client && make generate-v1-client)
  • Lambda Compatibility Suite: Test uses invalid event_type="lambda" (must be: session, model, tool, chain)
  • Integration Tests: External API quota/credential issues (OpenAI rate limits, missing Anthropic key)

Review & Testing Checklist for Human

  • Review generated client code for correctness (src/honeyhive/_v0/ and src/honeyhive/_v1/)
  • Verify backwards compatibility - existing v0 API usage should continue to work
  • Run make generate-v0-client && make generate-v1-client and verify no unexpected changes
  • Test instrumentor integrations work correctly with their respective providers
  • Review documentation changes for accuracy

Recommended test plan:

  1. Run direnv exec . pytest tests/ -v to verify test suite passes locally
  2. Test a sample application using the v0 API to confirm backwards compatibility
  3. Review the generated client diffs carefully given the large number of changes

Notes

- Add 'generate-v0-client' Makefile target using datamodel-code-generator
- Pin datamodel-code-generator to v0.43.0 for stable Pydantic v2 compatibility
- Add pre-commit as pip dev dependency instead of Nix package
- Remove pkgs.pre-commit from flake.nix buildInputs (install via pip instead)
- Update PYTHONPATH in flake.nix to include 'src' directory
- Regenerate models with Pydantic v2 syntax (RootModel, StrEnum, Annotated)
- Fix Python environment isolation: use single Python 3.12 venv instead of mixing with Nix Python 3.13
- All development tools now installed from same venv, eliminating version conflicts

✨ Created with OpenCode
@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

skylarmb and others added 2 commits December 11, 2025 22:21
Phase 1 of the v1 migration plan. Reorganizes the codebase to support
both v0.x and v1.x SDK versions from a single repository:

- Move api/ and models/ into src/honeyhive/_v0/
- Create public facades at api/__init__.py and models/__init__.py
- Add backwards-compat shims preserving deep import paths
- Update test mock paths to target _v0 module locations

This enables future v1 client generation into _v1/ with build-time
exclusion for separate PyPI packages.

✨ Created with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change @trace(name=...) to @trace(event_name=...) to match v0 API
- Change arbitrary kwargs to metadata={} dict to match v0 API
- Change tracer.tracer_id to tracer._tracer_id (private attribute)

These tests are new and should validate backwards compatibility with
the v0 client API, not introduce new API expectations.

Co-Authored-By: skylar@honeyhive.ai <skylarmb@gmail.com>
@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

skylarmb and others added 6 commits December 11, 2025 22:29
- Move openapi.yaml to openapi/v0.yaml
- Create minimal openapi/v1.yaml with single endpoint for testing
- Update generate_v0_models.py to use new v0 spec path
- Update generate_models_and_client.py to look for v1 spec first

✨ Created with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add scripts/generate_v1_client.py using openapi-python-client
- Add 'make generate-v1-client' Makefile target
- Generate initial _v1/ client from minimal openapi/v1.yaml spec

The v1 client uses attrs + httpx (via openapi-python-client) and will be
excluded from v0.x builds. Currently includes only /session/start endpoint
for testing the generation pipeline.

✨ Created with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a new CI job that regenerates both v0 and v1 clients and fails if
there are any uncommitted changes. This ensures generated code stays
in sync with OpenAPI specs.

Also updates path triggers to include openapi/ and scripts/generate_*.py.

✨ Created with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change tracer.tracer_id to tracer._tracer_id to match v0 API
which never exposed a public tracer_id attribute.

Co-Authored-By: skylar@honeyhive.ai <skylarmb@gmail.com>
@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

@github-actions
Copy link
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #163

…rfile

The Docker/Lambda CI tests were failing because the Dockerfile.bundle-builder
was missing pydantic-settings and attrs dependencies that honeyhive requires.

Also updates TODO.md to document resolved test fixes and remaining
implementation issues.

Co-Authored-By: skylar@honeyhive.ai <skylarmb@gmail.com>
skylarmb and others added 9 commits December 12, 2025 15:55
- Delete old api/ modules (configurations.py, datasets.py, etc.)
- Delete old models/ (generated.py, tracing.py)
- Generate full client from v1 OpenAPI spec
- Create ergonomic wrapper in api/client.py with all resource APIs:
  - configurations, datapoints, datasets, events, experiments
  - metrics, projects, sessions, tools
- Re-export all models via models/__init__.py
- Simplify honeyhive/__init__.py to just export HoneyHive client

Usage:
  from honeyhive import HoneyHive
  from honeyhive.models import CreateConfigurationRequest

  client = HoneyHive(api_key="hh_...")
  client.configurations.list(project="my-project")

✨ Created with Claude Code
Each API class now has both sync and async variants:
- list() / list_async()
- create() / create_async()
- update() / update_async()
- delete() / delete_async()
- etc.

Usage:
  # Sync
  configs = client.configurations.list()

  # Async
  configs = await client.configurations.list_async()

✨ Created with Claude Code
- Replace CreateRunRequest with PostExperimentRunRequest (actual generated model)
- Replace EvaluationRun with ExperimentResultSummary (from experiments.models)
- Remove invalid import from honeyhive.models.generated
- Create ExperimentRun alias from ExperimentResultSummary
- Add models/tracing.py with TracingParams for tracer compatibility

Document OpenAPI schema gaps:
- GET /runs/{run_id}/result returns TODOSchema (incomplete)
- GET /runs/compare-with returns TODOSchema (incomplete)
- Added to TODO.md Category 6 for backend team action

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove imports of v0 models (SessionStartRequest, PostConfigurationRequest, etc)
- Update test helper functions to return dicts instead of model instances
- Tests now use v1 API patterns for request/response objects

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove honeyhive.models.generated imports from 16 test files
- Replace v0 enum usages with string literal values:
  - CallType.chat → "chat"
  - EventType1.model → "model"
  - Operator.is_ → "is"
  - Type.string → "string"
  - Status.completed → "completed"
  - ReturnType.float → "float"
  - Type1.PYTHON → "PYTHON"
  - Type3.function → "function"
  - And all other enum variants

Test files updated:
- Unit tests: test_api_events, test_api_configurations, test_api_metrics,
  test_api_tools, test_api_evaluations, test_api_workflows,
  test_models_generated, test_models_integration, test_tracer_core_operations
- Integration tests: test_api_clients_integration, test_end_to_end_validation,
  test_model_integration, test_simple_integration,
  test_v1_immediate_ship_requirements
- Utilities: validation_helpers, backend_verification

All 16 files pass Python syntax validation ✅

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove SessionStartRequest and CreateEventRequest imports from honeyhive.models.generated
- Import CreateDatapointRequest and PostConfigurationRequest from honeyhive.models
- Update type hints to use Dict[str, Any] instead of removed v0 model types
- Functions already accept dict-based API responses from v1

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove unused PostConfigurationRequest import
- Use Dict[str, Any] type hint for config_request parameter
- All validation functions now accept dict-based API arguments

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Archived 15 v0-specific unit tests (test_api_*.py, test_models_*.py)
- Added pytest ignore rule to skip archived tests
- These tests relied on v0 API architecture (BaseAPI, ConfigurationsAPI, etc.)
- v1 uses auto-generated httpx client with ergonomic wrapper instead
- Created README explaining archive rationale
- Integration tests provide comprehensive v1 API coverage

✨ Created with Claude Code
skylarmb and others added 20 commits December 15, 2025 13:01
Add comprehensive documentation about endpoints returning Dict[str, Any]
instead of typed Pydantic models due to incomplete OpenAPI specs:

- Events service: All endpoints (createEvent, getEvents, createModelEvent, etc.)
- Session service: startSession endpoint
- Datapoints service: getDatapoint endpoint
- Projects service: TODOSchema placeholders

Impact:
- Explains why _get_field() workaround helper exists
- Documents long-term fix plan (OpenAPI spec + regeneration)
- Phase 1-3 roadmap for resolving the issue
- References in code and todo tracking

Files:
- New: UNTYPED_ENDPOINTS.md - Full analysis and fix plan
- Updated: backend_verification.py - Added documentation link in _get_field()
- Updated: INTEGRATION_TESTS_TODO.md - Added Generated Client Issues section

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Now that the OpenAPI spec has been updated and PostSessionResponse model
was generated, update all code that calls sessions.start() to use the
properly typed model instead of Dict[str, Any].

Changes:
- Import PostSessionResponse in client wrapper (client.py)
- Update SessionsAPI.start() and start_async() return types to PostSessionResponse
- Update backend verification to expect and use PostSessionResponse
- Update validation helpers to use attribute access on typed model
- Update all integration tests to use attribute access (session.session_id instead of session["session_id"])

This provides:
- Full type safety for session responses
- IDE autocomplete support
- Better error catching at development time
- Consistency across the codebase

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The 4 skipped tests in test_honeyhive_attributes_backend_integration.py
were marked as skipped due to EventType enum removal. However, they've
already been migrated to use string-based event_type values.

Changes:
- Remove skip decorators from all 4 tests
- Remove obsolete test_mode=False parameters (test_mode was removed from SDK)
- Update docstrings to reflect v1 string-based event_type approach
- Tests now properly validate that event_type strings are stored correctly

Tests re-enabled:
1. test_decorator_event_type_backend_verification
2. test_direct_span_event_type_inference
3. test_all_event_types_backend_conversion
4. test_multi_instance_attribute_isolation

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Now that the OpenAPI spec has been updated with proper Event schemas,
update the EventsAPI wrapper to use typed models instead of Dict[str, Any].

Changes:
- Import new Event models: PostEventRequest, PostEventResponse,
  GetEventsResponse, GetEventsBySessionIdResponse, GetEventsChartResponse,
  DeleteEventResponse
- Update list() return type to GetEventsResponse
- Update create() to accept PostEventRequest and return PostEventResponse
- Keep update() and create_batch() as Dict[str, Any] (models not yet available)
- Apply same changes to async methods

This provides:
- Full type safety for list and create operations
- IDE autocomplete and better error catching
- Consistency with generated services

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Update both core tracer components to use the new PostEventRequest model
when calling events.create() instead of passing dicts directly.

Changes:
- Import PostEventRequest in operations.py and span_processor.py
- Wrap event data in PostEventRequest(event=event_data) when calling create()
- This aligns with the new EventsAPI signature and provides type safety

Files updated:
- src/honeyhive/tracer/core/operations.py (line 708)
- src/honeyhive/tracer/processing/span_processor.py (line 813)

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Update test files and model exports to use newly typed Event models
instead of Dict[str, Any] for event operations.

Changes:
- Export Event models from honeyhive.models: PostEventRequest,
  PostEventResponse, GetEventsResponse, DeleteEventResponse, etc.
- tests/utils/backend_verification.py: Update events.list() response
  handling to use typed GetEventsResponse model
- tests/utils/validation_helpers.py: Wrap event creation in
  PostEventRequest and handle typed PostEventResponse
- tests/integration/test_end_to_end_validation.py: Update event list
  response handling to use typed GetEventsResponse

Benefits:
- Full type safety for all event API operations
- IDE autocomplete for response fields
- Cleaner, more maintainable test code
- Eliminates dict vs object branching logic

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Update events.create() and events.list() calls to use the newly typed
PostEventRequest, PostEventResponse, and GetEventsResponse models.

Changes:
- Import PostEventRequest, PostEventResponse, GetEventsResponse
- Wrap event data in PostEventRequest when creating events
- Update response checking to use typed models with isinstance()
- Access response fields via attributes instead of dict keys
- Fix events.list() to use data dict with correct parameter format
- Update event iteration to use typed Event attributes

This maintains test functionality while providing full type safety and
IDE autocomplete for event operations.

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
We changed the field from 'datapoints' to 'datasets' based on spec changes,
but without being able to verify against a working backend (endpoints return 404).

This commit reverts the change to keep the original 'datapoints' field name
until we can confirm what the actual API returns. Using unverified spec
changes risks breaking against real backend responses.

Changes:
- Revert GetDatasetsResponse.datapoints field in models
- Revert openapi/v1.yaml GetDatasetsResponse schema
- Update test_datasets_api.py to reference .datapoints instead of .datasets

✨ Created with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Fixed multiple integration test issues and corrected OpenAPI specification:

**OpenAPI Spec Fixes:**
- Fixed naming collision: renamed /v1/events/export operationId from 'getEvents' to 'exportEvents'
- Prevents function name collision in generated code

**SDK Fixes:**
- Added parameter filtering in EventsAPI.list() to only pass supported params
- Fixed experiments.core get_run_result() missing project_id argument
- Regenerated client code with corrected spec

**Test Fixes:**
- Fixed ValidationError imports (Pydantic v2 uses ValidationError not ValueError)
- Fixed configuration response field names (insertedId not inserted_id)
- Fixed datapoint validation logic in validation_helpers.py
- Added skip decorators for tests blocked by missing backend endpoints

**Documented Backend Issues:**
- Updated INTEGRATION_TESTS_TODO.md with all blocked endpoints
- Missing: GET /v1/events, POST /v1/session/start
- Broken: TODOSchema validation, metrics API, projects API

**Test Results:**
- test_model_integration.py: 6/6 passing ✅
- test_datasets_api.py: 4/7 passing (3 skipped)
- test_simple_integration.py: 6/7 passing (1 blocked)
- test_end_to_end_validation.py: 1/4 passing (3 skipped)

✨ Created with Claude Code
- Unskip and implement test_get_datapoint, test_update_datapoint, test_delete_datapoint
- Add proper imports for UpdateDatapointRequest/Response, DeleteDatapointResponse
- Fix response parsing for get endpoint (returns {datapoint: [...]})
- Unskip test_create_tool (works now)
- Document client bugs in tools API (delete passes tool_id, service expects function_id)
- Keep bulk_operations skipped (not implemented)

5 datapoints tests now passing, 1 tools test passing.

✨ Created with Claude Code
Configurations API (4 passing, 1 skipped):
- Unskip test_create_configuration, test_list_configurations,
  test_update_configuration, test_delete_configuration
- Keep test_get_configuration skipped (v1 API has no get method)
- Account for eventual consistency in list operations

Datasets API (5 passing, 2 skipped):
- Unskip test_delete_dataset with proper implementation
- Keep include_datapoints and update tests skipped (backend issues)

Total API tests: 15 passing, 21 skipped (was mostly skipped before)

✨ Created with Claude Code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants